Coding Exon Detection Using Comparative Sequences
Authors
Abstract
We introduce shortHMM, a new system that predicts individual exons using two related genomes. The system is built on a hidden semi-Markov model for identifying exons. Within this model, we propose joint probability models of nucleotides in introns, splice sites, 5'UTR, 3'UTR, and intergenic regions that exploit the homology between related genomes. To reduce the model's false positive rate, we develop a screening process that identifies intergenic regions. We then build a classifier that combines the statistics from the model with those from the screening process. We implement shortHMM on human-mouse sequence alignments. The source code is available at < www.stat.purdue.edu/ jingwu/hmm >. Compared to TWINSCAN and SLAM, shortHMM is substantially more powerful in identifying AT-rich RefSeq exons (8% more AT-rich RefSeq exons were predicted) and slightly more powerful in identifying RefSeq exons overall (3-10% more RefSeq exons were predicted), at a similar or lower false positive rate and with less computing time and memory. Finally, shortHMM is also capable of finding new potential exons.
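To illustrate the general idea of decoding exon-like segments from a pairwise alignment, here is a minimal sketch. It is not the shortHMM implementation: it uses a plain two-state hidden Markov model rather than a hidden semi-Markov model, it omits the screening process and the classifier, and every state name, probability, and function name below is a hypothetical toy value chosen for illustration. The only idea it borrows from the abstract is that each state emits a joint probability for a pair of aligned nucleotides, with conserved columns being more likely inside exons.

```python
import math

STATES = ("noncoding", "exon")

# Hypothetical toy parameters; none of these values come from shortHMM.
START = {"noncoding": 0.9, "exon": 0.1}
TRANS = {
    "noncoding": {"noncoding": 0.9, "exon": 0.1},
    "exon": {"noncoding": 0.1, "exon": 0.9},
}
# Assumed probability that two aligned bases are identical in each state.
MATCH = {"noncoding": 0.5, "exon": 0.9}


def emission(state, column):
    """Joint probability of an aligned base pair (base_1, base_2) in a state.

    The match mass is spread evenly over the 4 identical pairs and the
    mismatch mass over the 12 non-identical pairs.
    """
    base_1, base_2 = column
    p_match = MATCH[state]
    return p_match / 4 if base_1 == base_2 else (1 - p_match) / 12


def viterbi(columns):
    """Return the most probable state path for a non-empty list of columns."""
    scores = {
        s: math.log(START[s]) + math.log(emission(s, columns[0])) for s in STATES
    }
    backpointers = []
    for column in columns[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor for state s at this alignment column.
            prev = max(STATES, key=lambda p: scores[p] + math.log(TRANS[p][s]))
            new_scores[s] = (
                scores[prev] + math.log(TRANS[prev][s]) + math.log(emission(s, column))
            )
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy alignment: 5 mismatching columns, 10 conserved columns, 5 mismatching columns.
columns = [("A", "C")] * 5 + [("G", "G")] * 10 + [("A", "C")] * 5
path = viterbi(columns)
print("".join("E" if s == "exon" else "-" for s in path))
```

With these toy parameters, the conserved run in the middle is long enough to outweigh the cost of switching states twice, so it is labeled as exon. A semi-Markov model would additionally model the length distribution of each segment explicitly, which this sketch does not attempt.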
Similar resources
Enhanced restriction site mutation (RSM) analysis of 1,2-dimethylhydrazine induced mutations, using endogenous p53 intron sequences.
The restriction site mutation (RSM) assay was used to study the mutational sensitivities of three target regions of the murine p53 gene. The non-coding intron 6 target region was compared with the coding regions exon 4 and exon 5 with respect to their relative sensitivity to the induction of mutations by 1,2-dimethylhydrazine (DMH). Our results demonstrated that the majority of induced mutation...
Traffic Scene Analysis using Hierarchical Sparse Topical Coding
Analyzing motion patterns in traffic videos can be exploited directly to generate high-level descriptions of the video contents. Such descriptions may further be employed in different traffic applications such as traffic phase detection and abnormal event detection. One of the most recent and successful unsupervised methods for complex traffic scene analysis is based on topic models. In this pa...
Comparison of Gene and Exon Prediction Techniques for Detection of Short Coding Regions
Background: Segments of the DNA molecule called genes are known to carry useful information in their protein coding regions (exons) and are responsible for protein synthesis. In eukaryotes, exon regions are separated by non-coding regions (introns), where...
On Gene Prediction by Cross-Species Comparative Sequence Analysis
Sequencing of large fragments of genomic DNA makes it possible to perform comparisons of genomic sequences for identification of protein-coding regions. We have conducted a comparative analysis of homologous genomic sequences of organisms with different evolutionary distances and determined the degree of conservation of the non-coding regions between closely related organisms. In contrast, more...
Synonymous mutations in CFTR exon 12 affect splicing and are not neutral in evolution.
It is well established that exonic sequences contain regulatory elements of splicing that overlap with coding capacity. However, the conflict between ensuring splicing efficiency and preserving the coding capacity for an optimal protein during evolution has not been specifically analyzed. In fact, studies on genomic variability in fields as diverse as clinical genetics and molecular evolution m...
Coding elements in exons 2 and 3 target c-myc mRNA downregulation during myogenic differentiation.
Downregulation in expression of the c-myc proto-oncogene is an early molecular event in differentiation of murine C2C12 myoblasts into multinucleated myotubes. During differentiation, levels of c-myc mRNA decrease 3- to 10-fold despite a lack of change in its transcription rate. To identify cis-acting elements that target c-myc mRNA for downregulation during myogenesis, we stably transfected C2...
Journal title:
- Journal of computational biology : a journal of computational molecular cell biology
Volume 13, Issue 6
Pages -
Publication date 2006